Abstract

Let $\{X_n\}_{n=0}^{\infty}$ be a stationary real-valued time series with unknown
distribution. Our goal is to estimate the conditional expectation of
$X_{n+1}$ based on the observations $X_i$, $0 \le i \le n$, in a strongly consistent
way. Bailey and Ryabko proved that this is not possible even for 
ergodic binary time series if one estimates at all values of n. We 
propose a very simple algorithm which will make prediction infinitely 
often at carefully selected stopping times chosen by our rule. We show 
that under certain conditions our procedure is strongly (pointwise) 
consistent, and L2 consistent without any condition. An upper bound 
on the growth of the stopping times is also presented in this paper. 



1 Introduction 



Let $\{X_n\}_{n=0}^{\infty}$ be a real-valued time series. We are interested in estimating the
random variable $X_{n+1}$ given the past observations $X_0, \dots, X_n$. If the random
variable $X_{n+1}$ has finite expectation and we are to minimize the conditional
mean squared error then the solution is to choose the conditional expectation
$E(X_{n+1}|X_0, \dots, X_n)$. Usually, the distribution is not known a priori. In this
case we may try to estimate the above quantity from observations. 

Assume the distribution of the real-valued time series $\{X_n\}_{n=0}^{\infty}$ is stationary.
Now the goal is to estimate the conditional expectation $E(X_{n+1}|X_0, \dots, X_n)$
from the data segment $X_0, \dots, X_n$ such that the difference between the
estimate and the conditional expectation tends to zero almost surely as
the number of observations $n$ tends to infinity. However, [H] proved that
there is no such estimator if one estimates for all values of $n$, even for all
stationary and ergodic first order Markov chains taking values from the unit
interval $[0,1]$. This problem was posed originally in [S]. [S] (applying the
method of cutting and stacking developed in [22] and [27]) constructed a
family of stationary and ergodic binary processes such that for any estimation
scheme there was a process in his family for which the difference between the
estimate and the true conditional expectation did not tend to zero. (Cf. [25]
also.)

However, for the class of all stationary and ergodic binary Markov chains 
of some finite order one can solve this problem. Indeed, if the time series 
is a Markov chain of some finite (but unknown) order, we can estimate the 
order (cf. [9], and [8]) and count frequencies of blocks with length equal to 
the order. 

In another special case, for certain Gaussian processes, [25] constructed 
an estimator such that for that family of processes the error between his 
estimator and the true conditional expectation tends to zero almost surely 
as the number of observations increases. 

Here we note that a totally different problem is when the goal is to esti- 
mate the conditional expectation in such a way that the time average of the 
squared error is required to vanish as the number of observations tends to 
infinity. This problem can be easily solved, cf. [5], [2T], [DISIE], [I9], [20], 
[13] and [12]. (See also [28] and [11].) 

In this paper the setting is different. We do not weaken the error criterion,
that is, we still consider the difference between our estimate and
the true conditional expectation (rather than time averages), but we do not
require an estimate for every time instance $n$, but rather, merely along a
stopping time sequence. That is, looking at the data segment $X_0, \dots, X_n$,
our rule will decide whether we dare to estimate for this $n$ or not, but in any case we
will definitely estimate for infinitely many $n$.

Such an algorithm was proposed for binary time series in [17] but there the
growth of the stopping times is like an exponential tower, and so that scheme
is not feasible at all. A more practical algorithm was proposed in [TH] for
certain binary time series. In this paper we provide an algorithm for real-valued
processes.

2 Definition of the Estimator and Main Results

For technical reasons, we will consider two-sided stationary real-valued
processes $\{X_n\}_{n=-\infty}^{\infty}$. Note that a one-sided stationary time series $\{X_n\}_{n=0}^{\infty}$
can be extended to be a two-sided stationary time series $\{X_n\}_{n=-\infty}^{\infty}$.

For notational convenience, let $X_m^n = (X_m, \dots, X_n)$, where $m \le n$. Let
$\{\mathcal{P}_k\}_{k=0}^{\infty}$ denote a nested sequence of finite or countably infinite partitions of
the real line by intervals. Let $x \mapsto [x]^k$ denote a quantizer that assigns to
any point $x \in \mathbb{R}$ the unique interval in $\mathcal{P}_k$ that contains $x$. For a set $C \subset \mathbb{R}$
let $\mathrm{diam}(C) = \sup_{y,z \in C} |z - y|$. We assume that

$\lim_{k \to \infty} \mathrm{diam}([x]^k) = 0$ for all $x \in \mathbb{R}$. (1)

Let $[X_m^n]^k = ([X_m]^k, \dots, [X_n]^k)$. Let $1 \le l_k \le k$ be a nondecreasing sequence
of positive integers such that $\lim_{k \to \infty} l_k = \infty$. Put

$J(n) = \min\{j \ge 1 : l_{j+1} > n\}$.
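
As a small illustration of this bookkeeping, the following Python sketch evaluates $J(n)$ for one admissible block-length sequence. The choice $l_k = 1 + \lfloor \log_2 k \rfloor$ and the function names `block_length` and `J` are assumptions made for this example only; the text merely requires $1 \le l_k \le k$ with $l_k$ nondecreasing and unbounded.

```python
def block_length(k: int) -> int:
    """Hypothetical choice l_k = 1 + floor(log2 k), computed exactly via bit_length."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k.bit_length()


def J(n: int) -> int:
    """J(n) = min{ j >= 1 : l_{j+1} > n } for the block lengths above."""
    j = 1
    while block_length(j + 1) <= n:
        j += 1
    return j
```

With this choice, coordinates at lag $n$ start being covered by the matched blocks from index $J(n)$ on; a slower growing $l_k$ would push $J(n)$ further out.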

Define the stopping times as follows. Set $\zeta_0 = 0$. For $k = 1, 2, \dots$, define the
sequences $\eta_k$ and $\zeta_k$ recursively. Define

$\eta_1 = \min\{t > 0 : [X_{t-(l_1-1)}^{t}]^1 = [X_{-(l_1-1)}^{0}]^1\}$ and $\zeta_1 = \eta_1$.

Next we refine the quantization and look for the next occurrence of the block
of length $l_2$, namely

$\eta_2 = \min\{t > 0 : [X_{\zeta_1-(l_2-1)+t}^{\zeta_1+t}]^2 = [X_{\zeta_1-(l_2-1)}^{\zeta_1}]^2\}$ and $\zeta_2 = \zeta_1 + \eta_2$.

In general, we refine the quantization, and slowly increase the block length
of the next repetition, as follows:

$\eta_k = \min\{t > 0 : [X_{\zeta_{k-1}-(l_k-1)+t}^{\zeta_{k-1}+t}]^k = [X_{\zeta_{k-1}-(l_k-1)}^{\zeta_{k-1}}]^k\}$ and $\zeta_k = \zeta_{k-1} + \eta_k$. (2)

One denotes the $k$th estimate of $E(X_{\zeta_k+1}|X_0^{\zeta_k})$ by $g_k$, and defines it to be

$g_k = \frac{1}{k} \sum_{j=0}^{k-1} X_{\zeta_j+1}$. (3)
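
The recursion for the stopping times and the averaging step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the dyadic partitions of cell width $2^{-k}$, the block lengths $l_k = 1 + \lfloor \log_2 k \rfloor$, the cap `max_k`, and all function names are hypothetical choices. At each level $k$ the sketch searches for the first repetition of the quantized block ending at the previous stopping time and then averages the observations that followed the earlier stopping times.

```python
import math
from typing import List, Sequence, Tuple


def quantize(x: float, k: int) -> int:
    """Index of the level-k dyadic cell [i / 2**k, (i + 1) / 2**k) containing x."""
    return math.floor(x * 2 ** k)


def block_length(k: int) -> int:
    """Hypothetical block length l_k = 1 + floor(log2 k); satisfies 1 <= l_k <= k."""
    return k.bit_length()


def estimate_along_stopping_times(
    xs: Sequence[float], max_k: int = 1000
) -> Tuple[List[int], List[float]]:
    """Return ([zeta_0, ..., zeta_K], [g_1, ..., g_K]) computable from the sample xs.

    max_k caps the quantization level so that x * 2**k stays a finite float.
    """
    zetas = [0]
    estimates: List[float] = []
    running_sum = 0.0
    k = 1
    while k <= max_k:
        prev = zetas[-1]
        # start >= 0 because l_k <= k and zeta_{k-1} >= k - 1
        start = prev - (block_length(k) - 1)
        target = [quantize(x, k) for x in xs[start:prev + 1]]
        eta = None
        for t in range(1, len(xs) - prev):
            window = xs[start + t:prev + t + 1]
            if [quantize(x, k) for x in window] == target:
                eta = t  # first repetition of the quantized block
                break
        if eta is None:
            break  # the sample is too short to reach the next stopping time
        running_sum += xs[prev + 1]  # adds X_{zeta_{k-1} + 1}
        zetas.append(prev + eta)
        estimates.append(running_sum / k)  # g_k, available at time zeta_k
        k += 1
    return zetas, estimates
```

For the alternating sample `[0.1, 0.6] * 20` the block ending at each stopping time recurs two steps later, so the stopping times are the even indices and every estimate is the value that follows a 0.1, namely 0.6.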

Let $\mathbb{R}$ be the set of all real numbers and let $\mathbb{R}^{*-}$ be the set of all one-sided
sequences of real numbers, that is,

$\mathbb{R}^{*-} = \{(\dots, x_{-1}, x_0) : x_i \in \mathbb{R} \text{ for all } -\infty < i \le 0\}$.

Define a metric on sequences $(\dots, x_{-1}, x_0)$ and $(\dots, y_{-1}, y_0)$ as follows. Let

$d^*((\dots, x_{-1}, x_0), (\dots, y_{-1}, y_0)) = \sum_{i=0}^{\infty} 2^{-i-1} \frac{|x_{-i} - y_{-i}|}{1 + |x_{-i} - y_{-i}|}$. (4)

(For details see [10], p. 51.)



Definition 1 (Almost surely continuous conditional expectation.) The
conditional expectation $E(X_1|\dots, X_{-1}, X_0)$ is almost surely continuous if for
some set $C \subset \mathbb{R}^{*-}$ which has probability one the conditional expectation
$E(X_1|\dots, X_{-1}, X_0)$ restricted to this set $C$ is continuous with respect to the
metric $d^*(\cdot, \cdot)$ in (4).
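
To make the notion of closeness concrete, here is a hedged Python sketch of a truncated version of the metric. It assumes that $d^*$ has the standard weighted form $\sum_{i \ge 0} 2^{-i-1} |x_{-i} - y_{-i}| / (1 + |x_{-i} - y_{-i}|)$; the function name `d_star` and the list layout (entry `i` holds the coordinate at lag `i`) are choices made for this example only. Truncating at lag $n$ neglects at most $2^{-n}$.

```python
from typing import Sequence


def d_star(x: Sequence[float], y: Sequence[float]) -> float:
    """Truncated metric: x[i] and y[i] hold the coordinates at lag i (x_{-i}, y_{-i})."""
    if len(x) != len(y):
        raise ValueError("both pasts must be truncated at the same lag")
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        gap = abs(a - b)
        total += 2.0 ** (-i - 1) * gap / (1.0 + gap)
    return total
```

Two pasts that agree on the most recent coordinates are close in this metric even if they differ far back, which is why continuity in $d^*$ means that the conditional expectation depends mainly on the recent past.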

Example 1 A stationary and ergodic time series with almost surely contin- 
uous conditional expectation which is not continuous on the whole space. 

We will define a transformation $S$ on the unit interval. Consider the
binary expansion of each real number $r \in [0, 1)$, that is, $r = \sum_{i=1}^{\infty} r_i 2^{-i}$.
When there are two expansions, use the representation which contains finitely
many 1's. Now let

$\tau(r) = \min\{i > 0 : r_i = 1\}$.

Notice that, aside from the exceptional set $\{0\}$, which has Lebesgue measure
zero, $\tau$ is finite and well-defined on the closed unit interval. The transformation
is defined by

$(Sr)_i = 1$ if $0 < i < \tau(r)$,
$(Sr)_i = 0$ if $i = \tau(r)$, (5)
$(Sr)_i = r_i$ if $i > \tau(r)$.

Notice that in fact, $Sr = r - 2^{-\tau(r)} + \sum_{l=1}^{\tau(r)-1} 2^{-l}$. All iterations $S^k$ of $S$ for
$-\infty < k < \infty$ are well defined and invertible with the exception of the set
of dyadic rationals which has Lebesgue measure zero. This transformation $S$
could be defined recursively as

$Sr = r - 0.5$ if $0.5 \le r < 1$,
$Sr = \frac{1 + S(2r)}{2}$ if $0 \le r < 0.5$.

Now choose $r$ uniformly on the unit interval. Set $X_0(r) = r$ and put $X_n(r) =
S^n r$. Notice that the resulting time series $\{X_n\}$ is a stationary and ergodic
Markov chain of order one. What is more, one observation
determines the whole orbit of the process. Observe that $E(X_{n+1}|X_0^n) =
E(X_{n+1}|X_n)$ and $E(X_{n+1}|X_n = x) = Sx$. Since $S$ is a continuous mapping
disregarding the set of dyadic rationals, the resulting conditional expectation
is almost surely continuous. However, the conditional expectation is not continuous
on the whole unit interval, since it cannot be made continuous, for
example, at 0.5.
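
The transformation of Example 1 can be checked with exact rational arithmetic. The Python sketch below is an illustration only; the function names are hypothetical, and it assumes the reading of the example in which $S$ replaces the first binary digit equal to 1 by 0 and the zeros before it by 1's, so that $Sr = r - 2^{-\tau(r)} + \sum_{l=1}^{\tau(r)-1} 2^{-l}$, with the recursive form $Sr = r - 1/2$ on $[1/2, 1)$ and $Sr = (1 + S(2r))/2$ below $1/2$.

```python
from fractions import Fraction


def tau(r: Fraction) -> int:
    """Position of the first binary digit of r equal to 1 (requires 0 < r < 1)."""
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    i = 1
    while r < Fraction(1, 2 ** i):
        i += 1
    return i


def S_closed(r: Fraction) -> Fraction:
    """Closed form: Sr = r - 2**(-tau) + sum_{l=1}^{tau-1} 2**(-l)."""
    t = tau(r)
    return r - Fraction(1, 2 ** t) + sum(Fraction(1, 2 ** l) for l in range(1, t))


def S_recursive(r: Fraction) -> Fraction:
    """Recursive form: r - 1/2 on [1/2, 1), and (1 + S(2r)) / 2 on (0, 1/2)."""
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    if r >= Fraction(1, 2):
        return r - Fraction(1, 2)
    return (1 + S_recursive(2 * r)) / 2
```

The two forms agree on every tested input. The jump at 0.5 is visible directly: `S_closed(Fraction(1, 2))` is 0, while points just below 0.5 are mapped to values just below 0.75.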



The next theorem establishes the strong (pointwise) consistency of the proposed
estimator.

Theorem 1 Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) <
\infty$. For the estimator defined in (3) and for the stopping time $\zeta_k$ defined
in (2),

$\lim_{k \to \infty} |g_k - E(X_{\zeta_k+1}|X_0^{\zeta_k})| = 0$ almost surely

provided that the conditional expectation $E(X_1|X_{-\infty}^0)$ is almost surely continuous.



The proof of Theorem 1 involves both the martingale convergence theorem
and classical convergence results for an auxiliary sequence of orthogonal
random variables. The assumption on the almost sure continuity of the
conditional expectation is crucial in going from the auxiliary variables to
the actual random variables that take part in the estimator.

The consistency holds independently of how the sequence $l_k$ and the partitions
are chosen as long as $l_k$ goes to infinity and the partitions become
finer. However, the choice of these sequences has a great influence on the
growth of the stopping times.

From the proof of [5], [25] and [H] it is clear that even for the class of all
stationary and ergodic binary time series with almost surely continuous conditional
expectation $E(X_1|\dots, X_{-1}, X_0)$ one cannot estimate $E(X_{n+1}|X_0^n)$
for all $n$ strongly (pointwise) consistently.

Note that the processes constructed by the method of cutting and stacking
(cf. [22] and [27]) are stationary processes with almost surely continuous
conditional expectations.

The stationary processes with almost surely continuous conditional ex- 
pectation generalize the processes for which the conditional expectation is 
actually continuous. (Cf. [15] or [16].) 

If one's goal is to estimate the conditional mean merely in $L_2$ then the
problem becomes very easy and one can estimate it even for all time instances,
cf. [19]. We will prove that our proposed estimator $\{g_n\}$ along
the stopping time sequence $\{\zeta_n\}$ is not just strongly consistent under the
above mentioned continuity condition but also consistent in $L_2$ without any
continuity condition. The point here is that our scheme achieves two goals
simultaneously. In this way, if one runs our algorithm one can be sure that if
the above mentioned continuity condition holds then the algorithm achieves
strong consistency, and if that condition fails to hold then even
in that case it achieves $L_2$ consistency. Precisely:

Theorem 2 Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) <
\infty$. For the estimator defined in (3) and for the stopping time $\zeta_k$ defined
in (2),

$\lim_{k \to \infty} E(|g_k - E(X_{\zeta_k+1}|X_0^{\zeta_k})|^2) = 0$.



The next theorem gives an upper bound on the growth of the stopping times
$\{\zeta_k\}$ in the case when finite partitions are used.

Theorem 3 Let $\{X_n\}$ be a stationary real-valued time series. Assume $\mathcal{P}_k$ is
a nested sequence of finite partitions of the real line by intervals. If for some
$\epsilon > 0$, $\sum_{k=1}^{\infty} (k+1) 2^{-l_k \epsilon} < \infty$ then for the stopping time $\zeta_k$ defined in (2),

$\zeta_k < |\mathcal{P}_k|^{l_k} 2^{l_k \epsilon}$

eventually almost surely.



Example 2 One may set $\epsilon = 1$, $l_k = \lfloor 3 \log_2 k \rfloor$, and $|\mathcal{P}_k| = \lfloor 2^{f_k} \rfloor$ where $f_k$ is
an increasing sequence of positive real numbers tending to infinity arbitrarily
slowly. By Theorem 3, $\zeta_k < k^{3(1+f_k)}$, which is almost a polynomial growth.
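
The arithmetic behind Example 2 can be checked numerically. The Python sketch below assumes the readings $\epsilon = 1$, $l_k = \lfloor 3 \log_2 k \rfloor$ and $|\mathcal{P}_k| = \lfloor 2^{f_k} \rfloor$, and picks the hypothetical sequence $f_k = \log_2 \log_2 (k+2)$ purely for illustration; the function names are likewise assumptions. It starts at $k = 2$ because the formula gives $l_1 = 0$. Since $2^{-l_k} \le 2 k^{-3}$, the terms $(k+1) 2^{-l_k}$ are dominated by $2(k+1)/k^3$, and the bound of Theorem 3 is compared with $k^{3(1+f_k)}$ on a logarithmic scale.

```python
import math


def block_length(k: int) -> int:
    """l_k = floor(3 * log2 k), computed exactly as floor(log2(k**3))."""
    return (k ** 3).bit_length() - 1


def f(k: int) -> float:
    """Hypothetical slowly growing sequence f_k = log2(log2(k + 2))."""
    return math.log2(math.log2(k + 2))


def partition_size(k: int) -> int:
    """|P_k| = floor(2**f_k) = floor(log2(k + 2)) for the f_k above."""
    return (k + 2).bit_length() - 1


def partial_sum(n: int) -> float:
    """Partial sum of (k + 1) * 2**(-l_k) over 2 <= k <= n (epsilon = 1)."""
    return sum((k + 1) * 2.0 ** (-block_length(k)) for k in range(2, n + 1))


def log2_bound_gap(k: int) -> float:
    """log2 of k**(3 * (1 + f_k)) minus log2 of |P_k|**l_k * 2**l_k."""
    lhs = block_length(k) * (math.log2(partition_size(k)) + 1.0)
    rhs = 3.0 * math.log2(k) * (1.0 + f(k))
    return rhs - lhs
```

A nonnegative `log2_bound_gap(k)` confirms that $|\mathcal{P}_k|^{l_k} 2^{l_k} \le k^{3(1+f_k)}$ for that $k$, and the slowly changing partial sums are consistent with the summability condition of Theorem 3.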

In the case of finite alphabet processes one can achieve a slightly better upper
bound than in Theorem 3. Indeed, let $H$ denote the entropy rate associated
with the stationary and ergodic finite alphabet time series $\{X_n\}$, cf. [?].

Note that in this case no quantization is needed. Then it is easy to see that
$\zeta_k < 2^{l_k(H+\epsilon)}$ eventually almost surely provided that $(k+1) 2^{-l_k \epsilon}$ is summable.

(Cf. [ig, [23], m-) 

If one desires to estimate $X_{\zeta_j+1}$ in the $L_2$ sense based on data $X_0, \dots, X_{\zeta_j}$
then the best one can do is to choose the conditional expectation

$g_j^* = E(X_{\zeta_j+1}|X_0^{\zeta_j})$.

Now we show that the conditional mean squared error $E((X_{\zeta_j+1} - g_j)^2|X_0^{\zeta_j})$
with regard to $g_j$ is close to that of the best possible $E((X_{\zeta_j+1} - g_j^*)^2|X_0^{\zeta_j})$ for
large $j$. Indeed, this is an immediate consequence of Theorem 1, Theorem 2
and the fact that $E((X_{\zeta_j+1} - g_j)^2|X_0^{\zeta_j}) - E((X_{\zeta_j+1} - g_j^*)^2|X_0^{\zeta_j}) = (g_j - g_j^*)^2$.
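
The identity used here is the usual decomposition of a mean squared error around the conditional mean: for any predictor $g$ that is determined by the conditioning data, the excess conditional risk over the conditional mean $g^*$ equals $(g - g^*)^2$. The following Python sketch verifies this exactly on a small discrete conditional distribution; the distribution, the predictor value and the function name `excess_risk` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def excess_risk(
    values: Sequence[Fraction], probs: Sequence[Fraction], g: Fraction
) -> Tuple[Fraction, Fraction]:
    """Return (E(X - g)^2 - E(X - g*)^2, (g - g*)^2) for a finite distribution."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to one")
    g_star = sum(p * v for v, p in zip(values, probs))  # conditional mean

    def mse(c: Fraction) -> Fraction:
        return sum(p * (v - c) ** 2 for v, p in zip(values, probs))

    return mse(g) - mse(g_star), (g - g_star) ** 2
```

For the values 0, 1, 3 with probabilities 1/2, 1/4, 1/4 the mean is 1, so predicting 2 costs exactly $(2 - 1)^2 = 1$ in excess squared error.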



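Corollary 1 Let $\{X_n\}$ be a stationary real-valued time series. Assume $E(|X_0|^2) < \infty$. Then

$\lim_{j \to \infty} \left[ E((X_{\zeta_j+1} - g_j)^2|X_0^{\zeta_j}) - E((X_{\zeta_j+1} - g_j^*)^2|X_0^{\zeta_j}) \right] = 0$ (8)

in $L_1$. Moreover, if in addition, the conditional expectation $E(X_1|X_{-\infty}^0)$ is
almost surely continuous, then (8) holds almost surely.
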
Note that $X_n$ cannot be estimated for all $n$ in such a way that the conditional
mean squared error tends to zero in the pointwise sense even in the case of almost
surely continuous conditional expectation. (Cf. [5], [25], [H].) The main
point here is that along a sequence of stopping times one can achieve that
property.

3 Auxiliary Results 

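It will be useful to define other processes $\{\hat{X}_n^{(k)}\}_{n=-\infty}^{\infty}$ for $k \ge 0$ as follows.
Let

$\hat{X}_{-n}^{(k)} = X_{\zeta_k-n}$ for $-\infty < n < \infty$. (9)

For an arbitrary real-valued stationary time series $\{Y_n\}$, for $k \ge 0$ let $\hat{\zeta}_0^k(Y_{-\infty}^0) = 0$
and for all $k \ge 1$ and $1 \le i \le k$ define

$\hat{\eta}_i^k(Y_{-\infty}^0) = \min\{t > 0 : [Y_{\hat{\zeta}_{i-1}^k-(l_{k-i+1}-1)-t}^{\hat{\zeta}_{i-1}^k-t}]^{k-i+1} = [Y_{\hat{\zeta}_{i-1}^k-(l_{k-i+1}-1)}^{\hat{\zeta}_{i-1}^k}]^{k-i+1}\}$

and

$\hat{\zeta}_i^k(Y_{-\infty}^0) = \hat{\zeta}_{i-1}^k(Y_{-\infty}^0) - \hat{\eta}_i^k(Y_{-\infty}^0)$.

When it is obvious on which time series $\hat{\eta}_i^k(Y_{-\infty}^0)$ and $\hat{\zeta}_i^k(Y_{-\infty}^0)$ are evaluated,
we will use the notation $\hat{\eta}_i^k$ and $\hat{\zeta}_i^k$. Let $T$ denote the left shift operator, that
is, $(Tx_{-\infty}^{\infty})_i = x_{i+1}$. It is easy to see that if $\zeta_k(x_{-\infty}^{\infty}) = l$ then $\hat{\zeta}_k^k(T^l x_{-\infty}^{\infty}) = -l$.
We will need the next lemmas for later use.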

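Lemma 1 Let $\{X_n\}_{n=-\infty}^{\infty}$ be a real-valued stationary process. Then the time
series $\{\hat{X}_n^{(k)}\}_{n=-\infty}^{\infty}$ and $\{X_n\}_{n=-\infty}^{\infty}$ have identical distribution, that is, for all
$k \ge 0$, $n \ge 0$, $m \ge 0$, and Borel set $F \subset \mathbb{R}^{n+1}$,

$P((\hat{X}_{m-n}^{(k)}, \dots, \hat{X}_m^{(k)}) \in F) = P(X_{m-n}^m \in F)$.

Thus all the time series $\{\hat{X}_n^{(k)}\}_{n=-\infty}^{\infty}$ for $k = 0, 1, \dots$ are stationary.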

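Proof. Since the time series $\{X_n\}$ is stationary and for all $k \ge 0$, $n \ge 0$, $m \ge 0$,
$l \ge 0$, $F \subset \mathbb{R}^{n+1}$,

$T^l\{X_{\zeta_k+m-n}^{\zeta_k+m} \in F, \zeta_k = l\} = \{X_{m-n}^m \in F, \hat{\zeta}_k^k(X_{-\infty}^0) = -l\}$, (10)

and by the construction in (2), we have

$P((\hat{X}_{m-n}^{(k)}, \dots, \hat{X}_m^{(k)}) \in F) = P(X_{\zeta_k+m-n}^{\zeta_k+m} \in F) = \sum_{l=0}^{\infty} P(X_{\zeta_k+m-n}^{\zeta_k+m} \in F, \zeta_k = l)$

$= \sum_{l=0}^{\infty} P(X_{m-n}^m \in F, \hat{\zeta}_k^k(X_{-\infty}^0) = -l) = P(X_{m-n}^m \in F)$.

The proof of Lemma 1 is complete.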

For a given $n$, the partition cell $[X_{\zeta_j-n}]^j$ is a random set and is varying as
$j \to \infty$. However, we will prove that eventually it shrinks.

Lemma 2 Let $\{X_n\}_{n=-\infty}^{\infty}$ be a real-valued stationary process. Then for all
$n \ge 0$, $\lim_{j \to \infty} \mathrm{diam}([X_{\zeta_j-n}]^j) = 0$ almost surely.

Proof. Observe that by the definition of the stopping times in (2), for
a given $n$, $\{[X_{\zeta_j-n}]^j\}_{j=J(n)}^{\infty}$ is a decreasing sequence of intervals. Now, if for
some $j \ge J(n)$, $\mathrm{diam}([X_{\zeta_j-n}]^j) < \infty$ then $\lim_{i \to \infty} \mathrm{diam}([X_{\zeta_i-n}]^i) = 0$. To see
this notice that if $\lim_{i \to \infty} \mathrm{diam}([X_{\zeta_i-n}]^i) > 0$ then $\bigcap_{i=J(n)}^{\infty} [X_{\zeta_i-n}]^i \ne \emptyset$; let
$z$ denote a real number from this set. For this $z$, $\lim_{i \to \infty} \mathrm{diam}([z]^i) > 0$,
contradicting our assumption in (1). What remains is to prove that

$P(\mathrm{diam}([X_{\zeta_j-n}]^j) = \infty \text{ for all } j \ge J(n)) = 0$.

Indeed by Lemma 1 and assumption (1),

$P(\mathrm{diam}([X_{\zeta_j-n}]^j) = \infty \text{ for all } j \ge J(n))$

$\le \lim_{j \to \infty} P(\mathrm{diam}([X_{\zeta_j-n}]^j) = \infty) = \lim_{j \to \infty} P(\mathrm{diam}([\hat{X}_{-n}^{(j)}]^j) = \infty)$

$= \lim_{j \to \infty} P(\mathrm{diam}([X_{-n}]^j) = \infty) = \lim_{j \to \infty} P(\mathrm{diam}([X_1]^j) = \infty) = 0$.

The proof of Lemma 2 is complete.

Define the time series $\{\tilde{X}_n\}_{n=-\infty}^{0}$ as

$\tilde{X}_{-n} = \lim_{j \to \infty} X_{\zeta_j-n}$ for $n \ge 0$, (11)

where the limit exists since $\{[X_{\zeta_j-n}]^j\}_{j=J(n)}^{\infty}$ is a random sequence of nested
intervals and by Lemma 2 their lengths tend to zero.
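
Lemma 3 Let $\{X_n\}_{n=-\infty}^{\infty}$ be a real-valued stationary process. Then the distribution
of $\{\tilde{X}_n\}_{n=-\infty}^{0}$ equals the distribution of $\{X_n\}_{n=-\infty}^{0}$.

Proof. By Lemma 1 it is enough to prove that for any $i \ge 0$, for all
$j \ge J(i)$, $[\tilde{X}_{-i}]^j = [\hat{X}_{-i}^{(j)}]^j$. Let $R_k$ be the set of right end-points of the right
open intervals in the $k$-th partition, that is,

$R_k = \{b \in \mathbb{R} : \exists\, -\infty \le a < b, \ [a,b) \in \mathcal{P}_k \text{ or } (a,b) \in \mathcal{P}_k\}$.

Similarly, let $L_k$ be the set of left end-points of the left open intervals in the
$k$-th partition, that is,

$L_k = \{b \in \mathbb{R} : \exists\, b < a \le \infty, \ (b,a] \in \mathcal{P}_k \text{ or } (b,a) \in \mathcal{P}_k\}$.

If $[\tilde{X}_{-i}]^j = [\hat{X}_{-i}^{(j)}]^j$ fails for some $j \ge J(i)$ then this must happen at some
end point, that is, $\tilde{X}_{-i} \in \bigcup_{k=0}^{\infty} R_k$ or $\tilde{X}_{-i} \in \bigcup_{k=0}^{\infty} L_k$. (Since the partition
sequence is a nested sequence and $\tilde{X}_{-i} = \lim_{j \to \infty} \hat{X}_{-i}^{(j)}$.) Therefore we can
estimate: By (11), Lemma 2 and Lemma 1 we have

$1 - P(\tilde{X}_{-i} \in [\hat{X}_{-i}^{(j)}]^j \text{ for all } j \ge J(i))$

$\le \sum_{k=J(i)}^{\infty} \sum_{s \in R_k} P(\tilde{X}_{-i} = s, \ \hat{X}_{-i}^{(j)} < \tilde{X}_{-i} \text{ for all } j \ge k)$

$+ \sum_{k=J(i)}^{\infty} \sum_{s \in L_k} P(\tilde{X}_{-i} = s, \ \hat{X}_{-i}^{(j)} > \tilde{X}_{-i} \text{ for all } j \ge k)$

$\le \sum_{k=J(i)}^{\infty} \sum_{s \in R_k} \lim_{j \to \infty} P(s - \mathrm{diam}([\hat{X}_{-i}^{(j)}]^j) \le \hat{X}_{-i}^{(j)} < s)$

$+ \sum_{k=J(i)}^{\infty} \sum_{s \in L_k} \lim_{j \to \infty} P(s < \hat{X}_{-i}^{(j)} \le s + \mathrm{diam}([\hat{X}_{-i}^{(j)}]^j))$

$= \sum_{k=J(i)}^{\infty} \sum_{s \in R_k} \lim_{j \to \infty} P(s - \mathrm{diam}([X_{-i}]^j) \le X_{-i} < s)$

$+ \sum_{k=J(i)}^{\infty} \sum_{s \in L_k} \lim_{j \to \infty} P(s < X_{-i} \le s + \mathrm{diam}([X_{-i}]^j))$

$= \sum_{k=J(i)}^{\infty} \sum_{s \in R_k} \lim_{j \to \infty} P(s - \mathrm{diam}([X_1]^j) \le X_1 < s)$

$+ \sum_{k=J(i)}^{\infty} \sum_{s \in L_k} \lim_{j \to \infty} P(s < X_1 \le s + \mathrm{diam}([X_1]^j)) = 0$.

The proof of Lemma 3 is complete.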

Now it is immediate that the time series $\{\tilde{X}_n\}_{n=-\infty}^{0}$ is stationary, since
$\{X_n\}_{n=-\infty}^{0}$ is stationary, and it can be extended to be a two-sided time
series $\{\tilde{X}_n\}_{n=-\infty}^{\infty}$. We will use this fact only for the purpose of defining the
conditional expectation $E(\tilde{X}_1|\tilde{X}_{-\infty}^0)$.



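4 Proof of Theorem 1

Proof. Define the function $e : \mathbb{R}^{*-} \to (-\infty, \infty)$ as

$e(x_{-\infty}^0) = E(X_1|X_{-\infty}^0 = x_{-\infty}^0)$.

Recall (3) and consider

$g_k = \frac{1}{k} \sum_{j=0}^{k-1} (X_{\zeta_j+1} - E(X_{\zeta_j+1}|X_{-\infty}^{\zeta_j})) + \frac{1}{k} \sum_{j=0}^{k-1} E(X_{\zeta_j+1}|X_{-\infty}^{\zeta_j})$

$= \frac{1}{k} \sum_{j=0}^{k-1} \Gamma_j + \frac{1}{k} \sum_{j=0}^{k-1} E(X_{\zeta_j+1}|X_{-\infty}^{\zeta_j})$. (12)

Consider the first term and observe that $\{\Gamma_j\}$ is a sequence of orthogonal
random variables with $E\Gamma_j = 0$ and $E(\Gamma_j^2) \le E((X_1)^2) < \infty$ since $E(\Gamma_j^2) \le
E((X_{\zeta_j+1})^2)$ and, by Lemma 1, $X_{\zeta_j+1}$ has the same distribution as $X_1$. Now
by Theorem 3.2.2 in [24],

$\frac{1}{k} \sum_{j=0}^{k-1} \Gamma_j \to 0$ almost surely.

(Alternatively, one can apply Theorem A6 in [11].)

Now we deal with the second term. By the constructions in (9) and (11),

$\lim_{j \to \infty} d^*(\tilde{X}_{-\infty}^0, (\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)})) = 0$ almost surely. (13)

By assumption, the function $e(\cdot)$ is continuous on a set $C \subset \mathbb{R}^{*-}$ with
$P(X_{-\infty}^0 \in C) = 1$. By Lemma 1 and Lemma 3,

$P(\tilde{X}_{-\infty}^0 \in C, \ (\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)}) \in C \text{ for all } j \ge 0) = 1$. (14)

Now by the continuity of $e(\cdot)$ on the set $C$, and by (13) and (14),

$E(X_{\zeta_j+1}|X_{-\infty}^{\zeta_j}) = e(\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)}) \to e(\tilde{X}_{-\infty}^0) = E(\tilde{X}_1|\tilde{X}_{-\infty}^0)$. (15)

Thus $g_k \to e(\tilde{X}_{-\infty}^0)$ almost surely.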

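What remains to be proven is that almost surely, $E(X_{\zeta_j+1}|X_0^{\zeta_j}) \to e(\tilde{X}_{-\infty}^0)$.

For any set $A \subset \mathbb{R}$ let closure($A$) denote the smallest closed subset of the
real line containing $A$. Put

$B_j(X_0^{\zeta_j}) = \{z_{-\infty}^0 \in \mathbb{R}^{*-} : z_{-l_j+1} \in \mathrm{closure}([X_{\zeta_j-l_j+1}]^j), \dots, z_0 \in \mathrm{closure}([X_{\zeta_j}]^j)\}$.

By (2), (9) and (11), almost surely, for all $j$,

$\tilde{X}_{-\infty}^0 \in B_j(X_0^{\zeta_j})$ and $(\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)}) \in B_j(X_0^{\zeta_j})$. (16)

Put

$\Delta_j(X_0^{\zeta_j}) = \sup_{y_{-\infty}^0, z_{-\infty}^0 \in B_j(X_0^{\zeta_j}) \cap C} |e(y_{-\infty}^0) - e(z_{-\infty}^0)|$.

Now since $e(\cdot)$ is continuous at $\tilde{X}_{-\infty}^0$ on the set $C$, by (16) and Lemma 2,

$\lim_{j \to \infty} \Delta_j(X_0^{\zeta_j}) = 0$ almost surely. (17)

By (17), almost surely,

$\limsup_{j \to \infty} |E(e(\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)}) - e(\tilde{X}_{-\infty}^0)|X_0^{\zeta_j})|$

$\le \limsup_{j \to \infty} E(|e(\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)}) - e(\tilde{X}_{-\infty}^0)| \,|\, X_0^{\zeta_j})$

$\le \limsup_{j \to \infty} E(\Delta_j(X_0^{\zeta_j})|X_0^{\zeta_j}) = \limsup_{j \to \infty} \Delta_j(X_0^{\zeta_j}) = 0$. (18)

Now consider

$E(X_{\zeta_j+1}|X_0^{\zeta_j}) = E(e(\tilde{X}_{-\infty}^0)|X_0^{\zeta_j}) + E(e(\dots, \hat{X}_{-1}^{(j)}, \hat{X}_0^{(j)}) - e(\tilde{X}_{-\infty}^0)|X_0^{\zeta_j})$.

The first term is a martingale and tends to $e(\tilde{X}_{-\infty}^0)$ (by Theorem 7.6.2 in
[4]) since by Lemma 3, $E|e(\tilde{X}_{-\infty}^0)| = E|e(X_{-\infty}^0)| \le E|X_1| < \infty$, and
$\tilde{X}_{-\infty}^0$ is measurable with respect to $\sigma(X_0^{\infty})$. The second term tends to zero by
(18). The proof of Theorem 1 is complete.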

5 Proof of Theorem 2

Proof. By Jensen's inequality, (9) and Lemma 1,



< E 



(^0+1 "^(^^^+ll^- 



+ \Y.e{\e{x^A....x%xII^)-e{x[ 



l[^-Wi-i)'---'^o^ 



j=0 



3=0 



+ e{\e{x\ 



.(fc)i 



-^(fc) -V^l'^-'l -V-l'^V V*-' 



(fc) 







.(fc)l^(fc) 



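where $\hat{\zeta}_j^k$ is evaluated on $\{\hat{X}_n^{(k)}\}_{n=-\infty}^{\infty}$. The first term converges to zero since
$\Gamma_j = X_{\zeta_j+1} - E(X_{\zeta_j+1}|X_{-\infty}^{\zeta_j})$ is a sequence of orthogonal random variables
with $E(|X_{\zeta_j+1}|^2) = E(|X_0|^2) < \infty$, and

$E\left(\left|\frac{1}{k} \sum_{j=0}^{k-1} \Gamma_j\right|^2\right) = \frac{1}{k^2} \sum_{j=0}^{k-1} E(\Gamma_j^2) \le \frac{E(|X_0|^2)}{k} \to 0$. (19)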

Applying (9) and Lemma 1 one can estimate the sum of the last three terms
by the sum



k-i . 

limsup-5^E E{X,\X'_J - E{X,\[X'_ 



-(i,+i-l)J 



+ hmsupiX^ijf 
+ limsnpE (\e{Xi\X^_J - E{X,\X% 



fc— >oo V 
where $\hat{\zeta}_j^k$ is now evaluated on $\{X_n\}_{n=-\infty}^{\infty}$. These terms converge to zero
since $\lim_{j \to \infty} E(X_1|X_{-j}^0) = E(X_1|X_{-\infty}^0)$ and $\lim_{j \to \infty} E(X_1|[X_{-(l_{j+1}-1)}^0]^{j+1}) =
E(X_1|X_{-\infty}^0)$ in $L_2$ by the martingale convergence theorem, cf. Theorem
7.6.10 and Theorem 7.6.2 in [1], and thus the limit in fact exists and equals
zero. The proof of Theorem 2 is complete.



6 Proof of Theorem 3



Proof. Let $\mathbb{R}^*$ be the set of all two-sided sequences of real numbers, that
is,

$\mathbb{R}^* = \{(\dots, x_{-1}, x_0, x_1, \dots) : x_i \in \mathbb{R} \text{ for all } -\infty < i < \infty\}$.

Let $y_{-l_k+1}^0 \in \mathcal{P}_k^{l_k}$. Define the set $Q_k(y_{-l_k+1}^0)$ as follows:

We will estimate the probability of $Q_k(y_{-l_k+1}^0)$ by means of the ergodic theorem.
To do this apply the ergodic decomposition theorem, cf. [10], and denote
the distribution according to the ergodic mode $\omega$ by $P_\omega$. Let $x_{-\infty}^{\infty} \in \mathbb{R}^*$
be a typical sequence according to $P_\omega$. Define $\alpha_0(y_{-l_k+1}^0) = 0$ and for $i \ge 1$
let

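$\alpha_i(y_{-l_k+1}^0) = \min\{l > \alpha_{i-1}(y_{-l_k+1}^0) : T^l x_{-\infty}^{\infty} \in Q_k(y_{-l_k+1}^0)\}$.

Define also $\beta_0(y_{-l_k+1}^0) = 0$ and for $i \ge 1$ let

$\beta_i(y_{-l_k+1}^0) = \min\{l > \beta_{i-1}(y_{-l_k+1}^0) + |\mathcal{P}_k|^{l_k} 2^{l_k \epsilon} : T^l x_{-\infty}^{\infty} \in Q_k(y_{-l_k+1}^0)\}$.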

Observe that for arbitrary z > 0, 

oo 
3=1 
By Lemma 1 and ergodicity,



oo 



1 



t oo 



< lim 



t-^^yy-ih+i-' i=i j=i 
t{k+l) {k + l) 



t^oo t|Pfc|'fc2'fc^ |PfcP'=2'fc^' 

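Since the right hand side does not depend on $\omega$, the same upper bound
applies for the original stationary time series $\{X_n\}$, that is,

$P((\dots, X_{-1}, X_0, X_1, \dots) \in Q_k(y_{-l_k+1}^0)) \le \frac{k+1}{|\mathcal{P}_k|^{l_k} 2^{l_k \epsilon}}$.

By the construction in (2), $-\hat{\zeta}_k^k(\dots, \hat{X}_{-1}^{(k)}, \hat{X}_0^{(k)}) = \zeta_k(X_0^{\infty})$, and so we get

$P(\zeta_k(X_0^{\infty}) \ge |\mathcal{P}_k|^{l_k} 2^{l_k \epsilon})$

$= \sum_{y_{-l_k+1}^0 \in \mathcal{P}_k^{l_k}} P((\dots, \hat{X}_{-1}^{(k)}, \hat{X}_0^{(k)}, \hat{X}_1^{(k)}, \dots) \in Q_k(y_{-l_k+1}^0)) \le (k+1) 2^{-l_k \epsilon}$.

By assumption, the right hand side is summable, so the Borel-Cantelli Lemma yields
that $\zeta_k < |\mathcal{P}_k|^{l_k} 2^{l_k \epsilon}$ eventually almost surely, and Theorem 3 is proved.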